FactCrawl: A Fact Retrieval Framework for Full-Text Indices
Abstract
We present FactCrawl, a framework for retrieving structured, factual information by leveraging the full-text index of a search engine. The framework applies an approximation algorithm to the problem of retrieving all facts in a document collection with a minimal set of keywords while minimizing cost. The search engine is queried with automatically generated keywords, the results are re-ranked according to our fact score, and the documents are forwarded to a fact extractor.
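The abstract does not say which approximation algorithm FactCrawl uses. As an illustration only, the keyword-selection problem it describes can be read as a weighted set cover, for which a greedy heuristic is one common approximation. The sketch below assumes that reading. The function name `greedy_keyword_cover`, the `coverage` map (keyword to the documents it retrieves), and the `cost` map are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Set


def greedy_keyword_cover(
    coverage: Dict[str, Set[str]],
    cost: Dict[str, float],
) -> List[str]:
    """Select keywords until every retrievable document is covered.

    Assumption: this is a generic greedy weighted set cover, not the
    algorithm from the FactCrawl paper. At each step it picks the keyword
    with the lowest cost per newly covered document.
    """
    # Every document that at least one keyword can retrieve.
    uncovered: Set[str] = set().union(*coverage.values())
    chosen: List[str] = []
    while uncovered:
        # Only keywords that still add new documents are candidates.
        candidates = [k for k in coverage if coverage[k] & uncovered]
        # Break ties on the keyword itself so the result is deterministic.
        best = min(
            candidates,
            key=lambda k: (cost[k] / len(coverage[k] & uncovered), k),
        )
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


# Hypothetical toy data: which documents each keyword query returns,
# and what each query costs.
example_coverage = {
    "acquired": {"d1", "d2", "d3"},
    "merger": {"d3", "d4"},
    "ceo": {"d4"},
    "bought": {"d1", "d2"},
}
example_cost = {"acquired": 3.0, "merger": 2.0, "ceo": 0.5, "bought": 2.5}

print(greedy_keyword_cover(example_coverage, example_cost))
```

On this toy input the heuristic first takes the cheap keyword `ceo` and then `acquired`, which together cover all four documents. A real system would also need the re-ranking and extraction stages mentioned in the abstract, which this sketch leaves out.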
Similar resources
Managing Large Scale Native RDF Semantic Repository from the Graph Model Perspective
We propose a set of solutions for managing a large-scale RDF semantic repository from the perspective of the RDF graph model. Native storage, rather than a relational database, is used to hold RDF. Indices supporting regular path expressions, full-text retrieval, and partial OWL Lite inference are built on top of the storage model. Semantic ranking of resources is provided as well.
The Study on Lucene Based IETM Information Retrieval
With the intensive, large-scale application of IETM in integrated equipment support, information retrieval has become one of the key technologies. This article discusses full-text search technology and the Lucene full-text retrieval engine, and combines them to develop a high-performance, scalable IETM full-text retrieval system. This system can effectively deal with IETM unstruct...
THUIR at TREC 2005 Terabyte Track
This year, the IR group of Tsinghua University used its TMiner text retrieval system for indexing and retrieval in the ad hoc and named-page subtasks of the Terabyte track. For both tasks, we built the indices from the in-link anchor texts (the anchors of the URLs that point to the current page in the collection) together with the content texts of the web pages. When retrieving, the word-...
A Novel Hash-Based Streaming Scheme for Energy Efficient Full-Text Search in Wireless Data Broadcast
Full-text search is one of the most important and popular query types in document retrieval systems. With the development of the fourth-generation wireless network (4G), wireless data broadcast has attracted considerable interest because of its scalability, flexibility, and energy efficiency for wireless mobile computing. How to apply full-text search to documents transmitted through wireless commun...
SASE: Implementation of a Compressed Text Search Engine
Keyword-based search engines are the basic building blocks of text retrieval systems. Higher-level systems such as content-sensitive search engines and knowledge-based systems still rely on keyword search as the underlying text retrieval mechanism. With the explosive growth in content, Internet and intranet information repositories require efficient mechanisms to store as well as index data. In this...
Publication date: 2011